Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation

From:	Ulrich Hecht
Subject:	Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
Date:	Mon, 19 Oct 2009 19:17:18 +0200
User-agent:	KMail/1.9.10
On Saturday 17 October 2009, Aurelien Jarno wrote:
> On Fri, Oct 16, 2009 at 02:38:48PM +0200, Ulrich Hecht wrote:
> First of all a few general comments. Note that I know very few things
> about S390/S390X, so I may have dumb comments/questions. Also as the
> patch is very long, I probably have missed things, we should probably
> iterate with new versions of the patches.
>
> Is it possible given the current implementation to emulate S390 in
> addition to S390X?

Not as it stands. The S/390 instruction set is a subset of zArch 
(64-bit), but currently only 64-bit addressing is implemented (because 
that is the only mode used by 64-bit Linux binaries).

> If yes how different would be a S390 only target? 

It would need support for 31-bit addressing. 24-bit, too, if you want to 
run the really old stuff.

> If they won't be too different, it probably worth using _tl type
> registers and call it target-s390. A bit the way it is done with
> i386/x86_64, ppc/ppc64, mips/mips64 or sparc/sparc64.

If it's going to be implemented, it will certainly share most code with 
the 64-bit target, so having a common directory would most likely be a 
good idea.

There are some cases where _tl types might be appropriate, but generally 
everything that is shared between S/390 and zArch is 32-bit, and 
instructions present in both architectures also have the same operand 
size: L, for instance, is a 32-bit load to a 32-bit register on both 
architectures.

> Secondly there seems to be a lot of mix between 32-bit and 64-bit
> TCG registers. They should not be mixed. _i32 ops apply to TCGv_i32
> variables, _i64 ops apply to TCGv_i64 variables, and _tl ops to TGCv
> variables. TCGv/_tl types can map either to _i32 or _i64 depending on
> your target. You should try to build the target with
> --enable-debug-tcg

Er, reading the code and observing the behavior of the (AMD64) code 
generator, it seemed to me that neither frontends nor backends care one 
bit about that. I'll look into it, though. (I won't comment on those 
below, except in cases where the solution or, indeed, the problem is not 
clear to me.)

> It would be nice to split all the disassembling part in a separate
> combined with the related makefile changes. This way it can be applied
> separately.

Can do.

> >  //  switch (info->mach)
> >  //    {
> >  //    case bfd_mach_s390_31:
> > -      current_arch_mask = 1 << S390_OPCODE_ESA;
> > +//      current_arch_mask = 1 << S390_OPCODE_ESA;
> >  //      break;
> >  //    case bfd_mach_s390_64:
> > -//      current_arch_mask = 1 << S390_OPCODE_ZARCH;
> > +      current_arch_mask = 1 << S390_OPCODE_ZARCH;
> >  //      break;
> >  //    default:
> >  //      abort ();
>
> While I understand the second part, Why is it necessary to comment the
> bfd_mach_s390_31 case?

It's not necessary, but current_arch_mask can only hold one value...

> > +typedef union FPReg {
> > +    struct {
> > +#ifdef WORDS_BIGENDIAN
> > +        float32 e;
> > +        int32_t __pad;
> > +#else
> > +        int32_t __pad;
> > +        float32 e;
> > +#endif
> > +    };
> > +    float64 d;
> > +    uint64_t i;
> > +} FPReg;
>
> WORDS_BIGENDIAN is wrong here. It should probably be
> HOST_WORDS_BIGENDIAN.

Indeed.

> Also it may be a better idea to 
> reuse CPU_FloatU and CPU_DoubleU here.

Yes. Didn't know about those.

> > +CPUS390XState *cpu_s390x_init(const char *cpu_model)
>
> This function would probably be better in translate.c, so that
> s390x_translate_init() can be declared static.

I'll check that.

> Please use /* */ for comments (See CODING_STYLE).

OK.

> > +    /* FIXME: reset vector? */
>
> Yes, reset vector, MMU mode, default register values, ...

Works perfectly fine without. ;)

Also, the question remains what kind of reset cpu_reset() is supposed to 
be. For instance, a CPU Reset or Initial CPU Reset on S/390 leaves the 
general-purpose registers unchanged, a Clear Reset or Power-On Reset 
sets them to zero. See the table on p. 4-50 in the zArchitecture 
Principles of Operation (POP), 
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/A2278324.pdf, 
for the gory details.

> > +//#define DEBUG_HELPER
> > +#ifdef DEBUG_HELPER
> > +#define HELPER_LOG(x...) qemu_log(x)
> > +#else
> > +#define HELPER_LOG(x...)
> > +#endif
>
> Small comment for the rest of the file. While I understand HELPER_LOG
> is very handy while developing, I think some calls to it can be
> dropped in really simple functions.

Yes, a bit of cleanup will be in order.

> > +    for (i = 0; i <= l; i++) {
> > +        x = ldub(dest + i) & ldub(src + i);
> > +        if (x) cc = 1;
>
> coding style

Maybe you are referring to "Every indented statement is braced; even if 
the block contains just one statement.", but I figured that that does 
not apply here because there is no indentation...

> > +    if (v < 0) return 1;
> > +    else if (v > 0) return 2;
>
> coding style

Same here.

> The last 6 helpers can probably be coded easily in TCG (they are very
> similar to some PowerPC code), though merging this patch with the
> helpers is not a problem at all.

Again, that would be detrimental to performance. I have experimented with 
that and found that TCG is well capable of optimizing out a pure helper 
call, but not a complex piece of code with branches. Condition codes are 
not used most times, and the overhead of calling a helper in the cases 
they are is far smaller than doing all the condition code stuff that is 
not used. Also reduces code size.

> All the branch part would really gain to be coded in TCG, as it will
> allow TB chaining.

I vaguely remember trying that as well, but I don't know if I gave up on 
it because it was slower, or because I couldn't get it to work...

> > +            }
> > +            else if (r > d) {
>
> coding style

OK, this one is valid. :)

> __uint128_t is probably not supported on all hosts/GCC versions.

Definitely does not work on 32-bit hosts.

> mulu64() should be used instead.

Excellent, didn't know about that either.

> Same here, __uint128_t should not be used, though I don't know what
> should be used instead.

Hmmmm...

> > +/* unsigned string compare (c is string terminator) */
> > +uint32_t HELPER(clst)(uint32_t c, uint32_t r1, uint32_t r2)
> > +{
> > +    uint64_t s1 = env->regs[r1];
> > +    uint64_t s2 = env->regs[r2];
> > +    uint8_t v1, v2;
> > +    uint32_t cc;
> > +    c = c & 0xff;
> > +#ifdef CONFIG_USER_ONLY
> > +    if (!c) {
> > +        HELPER_LOG("%s: comparing '%s' and '%s'\n",
> > +                   __FUNCTION__, (char*)s1, (char*)s2);
> > +    }
> > +#endif
>
> Why CONFIG_USER_ONLY ?

s1 and s2 are target addresses and not valid on the host system in the 
SOFTMMU case, so this would segfault.

> > +/* union used for splitting/joining 128-bit floats to/from 64-bit
> > FP regs */ +typedef union {
> > +    struct {
> > +#ifdef WORDS_BIGENDIAN
> > +        uint64_t h;
> > +        uint64_t l;
> > +#else
> > +        uint64_t l;
> > +        uint64_t h;
> > +#endif
> > +    };
> > +    float128 x;
> > +} FP128;
>
> WORDS_BIGENDIAN is wrong here. CPU_QuadU can probably be used instead.

Agree.

> > +    if (float32_is_neg(v2)) {
> > +        v1 = float32_abs(v2);
> > +    }
> > +    else {
> > +        v1 = v2;
> > +    }
>
> I don't see the point of such a test here.

Now that you mention it, I don't really see it either.

> > +    v2.i = ldl(a2);
>
> The value should be passed directly instead of loaded here, as ldl is
> is wrong depending on the MMU mode.

Yup.

> > +#ifdef WORDS_BIGENDIAN
>
> This is wrong. It should probably be HOST_WORDS_BIGENDIAN.

I concur.

> > +static TCGv load_reg(int reg)
> > +{
> > +    TCGv r = tcg_temp_new_i64();
> > +#ifdef TCGREGS
> > +    sync_reg32(reg);
> > +    tcg_gen_mov_i64(r, tcgregs[reg]);
> > +    return r;
> > +#else
> > +    tcg_gen_ld_i64(r, cpu_env, offsetof(CPUState, regs[reg]));
> > +    return r;
> > +#endif
> > +}
>
> I don't really like implicit TCGv temp allocation. In other targets it
> often has caused missing tcg_temp_free().

I did in fact run into problems with this and was thus forced to make 
sure that no tcg_temp_free()s are missing... :)

> It should probably be 
> rewritten as load_reg(TCGv t, int reg).

Later. This is the kind of change that you never get right on the first 
try, so I'd like to do that after everything went upstream.

> For all register load/store, as already explained in the TCG sync op
> patch, I am in favor of using the ld/st version (TCGREGS not defined).

It doesn't perform.

> > +static void gen_illegal_opcode(DisasContext *s)
> > +{
> > +    TCGv tmp = tcg_temp_new_i64();
> > +    tcg_gen_movi_i64(tmp, 42);
>
> tcg_const_i64 could be used instead.

Yes. Using the actual value for a specification exception instead of 42 
might be a good idea as well. :)

>
> > +    gen_helper_exception(tmp);
>
> Missing tcg_temp_free_i64(tmp);

OK, so I missed _that_ one. Fine. Be like that if you want to. Illegal 
instructions are not that frequent anyway...

> > +    gen_helper_cmp_s32(cc, v1, tcg_const_i32(v2));
>
> The TCG passed to the helper should be freed.

And that one...

> > +    gen_helper_cmp_u32(cc, v1, tcg_const_i32(v2));
>
> Same.

And that one... :(

> > +    gen_helper_cmp_s64(cc, v1, tcg_const_i64(v2));
>
> Same

And that one... :((

> > +    gen_helper_cmp_u64(cc, v1, tcg_const_i64(v2));
>
> Same

And that one... :(((

> The branches should be handled using brcond and goto_tb and exit_tb
> and not with helpers, to allow TB chaining.

I'll look into that, but I guess it's not entirely trivial.

> > +    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
>
> This is wrong. 0 maps to global 0 (env in your case). -1 should be
> used instead, or even better the TCGV_UNUSED macro.

OK.

> > +    case 0x12: /* LT R1,D2(X2,B2) [RXY] */
> > +        tmp2 = tcg_temp_new_i32();
>
> Wrong TCGv type.
>
> > +        tcg_gen_qemu_ld32s(tmp2, tmp, 1);
>
> tcg_gen_qemu_ld32s loads a 32 bit value in the default size register,
> that is 64-bit here.

Actually, this a purely 32-bit insn, so it should be tcg_gen_qemu_ld32() 
in the first place.

> > +            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> > +            tcg_gen_ext32s_i64(tmp2, tmp2);
>
> The sign extension is already done by qemu_ld32s

Just to be on the safe side. :)

> > +        switch (op) {
> > +        case 0x8: case 0x18: gen_set_cc_add64(tmp, tmp2, tmp3);
> > break; +        case 0xa: case 0x1a: gen_helper_set_cc_addu64(cc,
> > tmp, tmp2, tmp3); break; +        default: tcg_abort();
>
> tcg_abort() is wrong here, it is for internal TCG use. Also all the
> real CPU it probably launch an illegal instruction exception, it does
> not power off the machin.

Actually, if default is reached here there's an error in the translator 
(it should only get here in the explicit cases), so I'd consider 
tcg_abort() appropriate.

> > +    case 0x1e: /* LRV R1,D2(X2,B2) [RXY] */
> > +        tmp2 = tcg_temp_new_i32();
>
> Wrong TCGv type;
>
> > +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> > +        tcg_gen_bswap32_i32(tmp2, tmp2);
>
> Wrong TCGv type;
>
> > +        store_reg(r1, tmp2);

This is more wrong than you think, because this is a 32-bit insn, so it 
shouldn't overwrite the top 32 bits. Rarely used, I guess that's how it 
could slip through.

> > +    case 0x3e: /* STRV R1,D2(X2,B2) [RXY] */
> > +        tmp2 = load_reg32(r1);
> > +        tcg_gen_bswap32_i32(tmp2, tmp2);
> > +        tcg_gen_qemu_st32(tmp2, tmp, 1);
>
> Wrong TCGv type.

What's wrong here? Everything's 32-bit.

> > +    case 0x57: /* XY R1,D2(X2,B2) [RXY] */
> > +        tmp2 = load_reg32(r1);
> > +        tmp3 = tcg_temp_new_i32();
> > +        tcg_gen_qemu_ld32u(tmp3, tmp, 1);
> > +        tcg_gen_xor_i32(tmp, tmp2, tmp3);
> > +        store_reg32(r1, tmp);
> > +        set_cc_nz_u32(tmp);
>
> Wrong TCGv type.

No need to zero-extend anyway, it's a 32-bit insn.

> > +    case 0x76: /* LB R1,D2(X2,B2) [RXY] */
> > +    case 0x77: /* LGB R1,D2(X2,B2) [RXY] */
> > +        tmp2 = tcg_temp_new_i64();
> > +        tcg_gen_qemu_ld8s(tmp2, tmp, 1);
> > +        switch (op) {
> > +        case 0x76:
> > +            tcg_gen_ext8s_i32(tmp2, tmp2);
>
> Wrong TCGv type.
>
> > +            store_reg32(r1, tmp2);
> > +            break;
> > +        case 0x77:
> > +            tcg_gen_ext8s_i64(tmp2, tmp2);
>
> Wrong TCGv type.

The way I see it, there are only 32-bit and 64-bit types, so it can only 
be wrong in one case, right?

> > +    case 0x78: /* LHY R1,D2(X2,B2) [RXY] */
> > +        tmp2 = tcg_temp_new_i32();
>
> Wrong TCGv type.
>
> > +        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
> > +        tcg_gen_ext16s_i32(tmp2, tmp2);
> > +        store_reg32(r1, tmp2);
> > +        break;

Replacing that tcg_gen_qemu_ld16s() with tcg_gen_qemu_ld16() should do 
it, right?

> > +    case 0x86: /* MLG      R1,D2(X2,B2)     [RXY] */
> > +        tmp2 = tcg_temp_new_i64();
> > +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> > +        tmp = tcg_const_i32(r1);
>
> Wrong TCGv type.
>
> > +        gen_helper_mlg(tmp, tmp2);

Huh? It's a register number. There are only 16 GP registers. Fits a 
32-bit value, last time I checked. The helper also expects a 32-bit 
value.

> > +    if (tmp2) tcg_temp_free(tmp2);
> > +    if (tmp3) tcg_temp_free(tmp3);
>
> Comparison on TCGv type is not allowed.

Not even to TCGV_UNUSED?

> > +static void disas_ed(DisasContext *s, int op, int r1, int x2, int
> > b2, int d2, int r1b) +{
> > +    TCGv tmp, tmp2, tmp3 = 0;
>
> tmp should be declared as TGV_i32 here, so that the types are correct
> for the whole function. Also tmp3 should not be initialized to 0.

OK.

> > +        gen_helper_madb(tmp3, tmp2, tmp);
>
> tmp3 should be freed after the helper.

Yup.

> > +    LOG_DISAS("disas_c0: op 0x%x r1 %d i2 %d\n", op, r1, i2);
> > +    uint64_t target = s->pc + i2 * 2;
> > +    /* FIXME: huh? */ target &= 0xffffffff;
>
> That should be fixed before a merge.

Tricky, because I have not yet understood why it is necessary. According 
to the POP, the address calculation in 64-bit addressing mode should be 
64 bits. In practice, on a real machine running a real Linux kernel, it 
wraps around at 32 bits, and userland code actually relies on that. No 
clue what I could be missing here...

> > +static inline uint64_t ld_code2(uint64_t pc) { return
> > (uint64_t)lduw_code(pc); }
>
> Coding style.

OK.

> > +#if 1
> > +    cc = tcg_temp_local_new_i32();
> > +    tcg_gen_mov_i32(cc, global_cc);
> > +#else
> > +    cc = global_cc;
> > +#endif
>
> Why this?

Because it didn't work otherwise. I don't claim to understand the 
mysterious ways of the TCG to its full extent...

> > +    } while (!dc.is_jmp && gen_opc_ptr < gen_opc_end && dc.pc <
> > next_page_start && num_insns < max_insns);
>
> The also translation should be stopped if in singlestep mode
> (singlestep variable).

OK.

Phew, that was a big one. :)

Thanks for reviewing. I'll fix what makes sense to me and send a new one 
when I'm done.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
[Prev in Thread]
Current Thread
[Next in Thread]
[Qemu-devel] [PATCH 2/9] S/390 CPU emulation, (continued)
Prev by Date: [Qemu-devel] Re: [PATCH 6/9] QError: Add qdev not found error
Next by Date: Re: [Qemu-devel] [PATCH 1/9] TCG "sync" op
Previous by thread: Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
Next by thread: Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
Index(es):
- Date
- Thread